Privacy vs. Utility in Anonymized Data
Abstract
We investigate the privacy and utility aspects of k-anonymity, which has received much research attention since its introduction by Sweeney [2002]. Meyerson and Williams [2004] showed that finding an optimal k-anonymization is NP-hard and developed a first approximation algorithm. Further algorithms with different approximation guarantees have since been proposed, but they remain hard to compare. We are interested in an algorithm that gives good privacy guarantees while preserving the utility of the data. Explicitly capturing the notions of privacy and utility in this context seems hard and has not been fully achieved yet. We implemented the k-anonymization algorithm presented in [LeFevre et al., 2006] and evaluated its performance on three different real-world databases. We describe the algorithm and some implementation details in section 2. In section 3, we present some known utility measures on anonymized databases. These measures are quite general and do not capture many aspects of utility accurately. In the same section, we present a workload-specific measure that proves very helpful in evaluating the usefulness of different anonymizations. However, workload-specific measures are only helpful if the publisher of the database has sufficient knowledge about the expected queries. Furthermore, we discuss several heuristics to adapt the anonymization algorithm to an expected workload in section 4, and present the results of our evaluation in section 5. In the appendix, we summarize our individual contributions to this project and give the complete source code of our anonymization and query program.
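As a rough illustration of the two notions the abstract contrasts, the sketch below checks whether a table is k-anonymous and computes one general-purpose utility score. It is not the program from the paper's appendix: the helper names, the toy records, and the choice of the discernibility metric (sum of squared equivalence-class sizes) are assumptions made for illustration, and the abstract does not say which utility measures section 3 actually covers.

```python
from collections import Counter


def equivalence_classes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(size >= k for size in classes.values())


def discernibility(rows, quasi_identifiers):
    """Sum of squared class sizes; smaller values mean finer-grained classes."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return sum(size * size for size in classes.values())


# Hypothetical, already-generalized records (illustrative data only).
rows = [
    {"age": "20-29", "zip": "021**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "021**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "diabetes"},
]
qi = ["age", "zip"]

print(is_k_anonymous(rows, qi, 2))  # True: both classes contain 2 rows
print(is_k_anonymous(rows, qi, 3))  # False: no class reaches 3 rows
print(discernibility(rows, qi))     # 8 = 2*2 + 2*2
```

A score like this depends only on class sizes, which is one way to see why general measures can miss aspects of utility that a workload-specific measure would catch.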
Similar resources
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about publishing social network graph data have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
A Survey of Utility-based Privacy-Preserving Data Transformation Methods
As a serious concern in data publishing and analysis, privacy preserving data processing has received a lot of attention. Privacy preservation often leads to information loss. Consequently, we want to minimize utility loss as long as the privacy is preserved. In this chapter, we survey the utility-based privacy preservation methods systematically. We first briefly discuss the privacy models and...
Small Data Privacy Protection: An Exploration of the Utility of Anonymized Data of People with Rare Diseases
Sociotechnical researchers have recently begun studying people with rare diseases. There is potential for impact if data can be anonymized and shared so additional research can take place. However, this data also presents a high risk of re-identification because of the rarity of the diseases. Using existing research on data protection techniques, we generate an anonymized version of a rare dise...
Utility-preserving anonymization for health data publishing
BACKGROUND: Publishing raw electronic health records (EHRs) may be considered a breach of the privacy of individuals because they usually contain sensitive information. A common practice in privacy-preserving data publishing is to anonymize the data before publishing, and thus satisfy privacy models such as k-anonymity. Among various anonymization techniques, generalization is the most c...
Secure Distributed Data Anonymization and Integration with m-Privacy
In this paper, we study the collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. We consider a new type of “insider attack” by colluding data providers who may use their own data records (a subset of the overall data) to infer the data records contributed by other data providers. The paper addresses this new threat, and makes several co...
Publication date: 2005